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Abstract: Weighted kernel-density estimates (wKDE) are broadly used in 
many statistical areas, for instant, density estimation under right-censoring. 
However, bandwidth selection could be a problem by reweighting the ker- 
nels. In this paper, we investigate the methods of bandwidth selection for 
wKDE. Three mean integrated squared error based bandwidth selection 
methods are introduced. The least-squares cross-validation method, the 
adaptive weight kernel density estimator and boundary problems are also 
studied. Monte Carlo simulations were conducted to demonstrate the per- 
formance of the proposed bandwidth selection methods. Finally, the perfor- 
mance of wKDE is illustrated via an application to biased sampling problem 
and a real data application. 

Keywords and phrases: Biased Sampling, Informative Censoring, Right 
Censoring. 

1. Introduction 

Kernel-type density estimators are widely used due to their simple forms and 
smoothness. In survival analysis, weighted kernel estimation is an important 
method of estimating the density of the survival times and the hazard function 
for censoring data[TT]. Let X\, X 2 , ■ ■ -X n be a set of data. A general class of 
density estimators can be defined as reweighted weight function estimators, 

n 

f(x) = J2HXi)W(x,X i ), (1) 

i=l 

where W(.) is a weight function, which is usually taken to be a (symmetric) 
density function, and w(.) is a re-weighting function which can be adjusted to 
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control the roles of different data points in the sample. If we take 

1 / y — x \ 1 
W(x,y) = ~K[ — - — )=K h (y-x) and w(x)=—, 
h \ h J n 

we get a standard kernel density estimator, 



nh V h I ^-^ n 

i=l v 7 i=l 

where K{.) is a kernel function and h is the bandwidth which controls the 
smooth of the estimate. Often the re-weighting function w(.) is required to be 
non- negative and sum up to 1. It can be a function of X itself and/or a vector 
of covariates. For example, Gisbert used a function of the population of each 
country to reweight the kernel density estimate of per capita GDP [5]. Marron 
and Padgett used a weighting function s, which is defined to be the jump sizes at 
X of the product-limit (PL) estimator by Kaplan and Meier 10], to correct the 
random right-censoring bias (MP estimator hereafter) [T5] . In the MP estimator, 
s is a function of both X and the censoring variable A, which is defined to be 

. , . _ J 0, if Xi is censored, 

1 1, if Xi is not censored. 

Thus, the bias induced by the random right-censoring can be corrected by the 
following MP estimator, 

n 

fmpjx) = ^ SjKhjx - Xi). (3) 
i=l 

Throughout this paper, the following general form of weighted kernel density 
estimator will be used, 

n 

f(x)=Y / w(^,Z l )K h (x-X l ), (4) 

i=l 

where Z is the covariate(s). We simply choose K (.) to be a Gaussian kernel. For 
computational considerations, the Epanechnikov kernel can be used for large 
samples. Literature shows that there is very little to choose between various 
kernels on the basis of mean integrated square error. In this study, will focus 
on data with small or moderate sizes, so computation burden will not be a big 
issue. 

The bandwidth selection problem has been well studied and documented for 
the unweighted kernel density estimation. In Section[2]we will discuss the band- 
width selection problem for wKDE. Two rough estimators and a plug-in estima- 
tor of the optimal bandwidth are proposed. The least-squares cross-validation 
method will also be discussed to refine the three estimates from Section [2] and 
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to automatically select the bandwidth. The adaptive wKDE and the boundary 
problem will be discussed in Session [3] and Section 2] respectively. The perfor- 
mance of the proposed methods will be illustrated via simulation studies in 
Section [5l Section [6] consists of two applications: an application of estimating 
densities from biased sampling by wKDE and a real data application showing 
how to estimate from survival data subject to informative censoring. Section [7] 
concludes the paper with a brief summary. 

2. Bandwidth Selection 
2.1. Two rough approaches 

Bandwidth selection is one of the most important issues in kernel density es- 
timation. In this paper, we consider selecting bandwidths based on the mean 
integrated squared error (MISE) criteria. The MISE criteria was first proposed 
by Rosenblatt [20] and has been widely used for automatic bandwidth selection. 
The MISE of / can be decomposed into a sum of an integrated square bias term 
and an integrated variance term as below, 

MISE(f) = [ {Ef(x) - f{x)} 2 dx + [ varf{x)dx. (5) 



If we take equal weights for all data points such that w(.) — 1/n in (J4j) , the 
optimal bandwidth minimizing the MISE is 

h opt ^k~ 2/5 K(t) 2 dtj ' y f"(x) 2 dx} ' n- 1 / 5 , (6) 

where = J t 2 K(t)dt [21] ■ However, with unequal weights, the expectation of 
f(x) for each x is 

Ef(x) =J2 E Zi)K h (x - X t )] . (7) 

The bias in ([5]), Ef(x) — f(x), can be rewritten as 

/n fx — y \ 
w{y, z)-K [ — — \ f(y)dy - f(x) 



y—x — ht 



nw(x ~ ht, z)K(t)f(x - ht)dt - f(x) 
K(t)[n-w(x-ht,z)f(x)- f(x)]dt (8) 



-nhf'(x) J w(x-ht,z)tK(t)dt 
+!t!Lf"(x) t 2 w(x-ht,z)K(t)dt 



-0(h 3 ). 
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If w{.) 7^ 1/n, n ■ w(x — ht,z)f(x) — f(x) ^ 0, and the first term to the right 
hand side (RHS) of © won't be zero. Also, although the Gaussian kernel is used 
such that J tK(t)dt = 0, after the Taylor series expansion, the integral part in 
the second term to the RHS of ([8]) won't be zero either, even if w(.) is free of 
Z. Therefore, we can not rewrite the bias of / in form of the sum of a term of 
order h 2 and higher order terms as in [29l [24] . 

An easy and rough approximation is to expand and view the weighted sample 
as an new unweighted sample. For instance, let X = {X\, X2, ■ ■ ■ , X n } be the 
observed survival data with random right-censoring. If Xi is censored, the true 
survival time of the ith individual is Tj > Xi. Thus, we can view the censored 
survival data as a data set consists of observed event times and latented true 
survival times, with the same size and all data points have equal weights. Thus, 
we can continue to use h opt in ([6]) to select the optimal bandwidth. We can find 
that &2 and Kit) in ((6]) do not depend on the data. If n is kept unchanged, 
we need only figure out how to compute J f" 2 based on the weighted sample 
in computing h op t- For complete weighted data, n may be hard to determine, 
we will leave this problem for future studies. With additional assumptions, we 
propose the following two rough approach of h op t ■ 

Rough approach 1: In the same spirit of Parzen (1962) and Silverman 
(1986), we use a normal reference density to compute J f" 2 and approximate 
the optimal bandwidth roughly by 

h n = 0.9 1/5 , (9) 

where 

A — min(s w , IQR W /1.34), 

where s w and IQR W are the sample standard deviation and sample intcr-quartile 
range. We compute the weighted sample mean and sample variance by 

n 

y^^w(Xi,Zi)Xi, 
1=1 

n 

^ W (A 4 ,Z 4 )(A 2 -^) 2 . 

i=l 

Let X(i), X(2), ■ ■ ■ ^(n) be the order statistics of X\, X2, ■ ■ ■ , X n and W\ , W2 , • • • -w, 
be the corresponding weights of the order statistics. We find two integers q\ and 
92 such that 

91 

Wi < .25 and 

i=l 

Wi < .75 and 

i=l 



91 + 1 

Wi > .25, 

i=l 

92 + 1 

y~] Wi > .75. 

i=i 



imsart-ejs ver. 2007/04/13 file: wkde3.tex date: February 1, 2008 



B. Wang and X. Wang/Bandwidth Selection for Weighted Kernel Density Estimation 4 



Let 

<?l 92 

Pi = .25 — W{ and P2 = -75 — itfj. 

i=l i=l 

Based on the weighted sample, we can compute the first and third quartiles and 
IQR W by 

Qlw = -^"(gi) + Pl(^"(gi+1) - X (qi))i 
Q?,w = -X"(g 2 ) +P2pf(g 2 +1) - ^(g 2 ))) 

and 



Rough approach 2: In survival data analysis, the exponential reference 
density is widely used in estimating J f" 2 . Let 

f(x) = 1/A • exp(-x/A), 

We have 

f'{xfdx = e-^dx = Jjj. (10) 

Plug- in ([TO]) to ©, we get 

/i opt = 7r- 1/10 An- 1/5 = .892An" 1/5 . (11) 

The coemcient in (jlll) is close to 0.9 as in ©. We then get another rough 
estimate of the optimal bandwidth, h e , by keeping the form of ©, 

h e = 0.9Bn- 1/5 , (12) 

where 

B = min(A,/Qi? tu /1.34), 

and A can be estimated by a maximum likelihood estimate A = ^2 X^j ^ Aj for 
the right-censored data and A = X for the complete data. 



2.2. Plug-in estimator 



For complete equal weighted data, in stead of using normal or exponential 
reference densities, a plug-in method can be used to automatically select the 
bandwidth by estimating J f" 2 in ([6]) with another kernel based estimate as in 
[T8l [22] . However, for weighted samples or incomplete data such as the right- 
censored data, we won't be able to get a close form of the bandwidth h by 
minimizing the first two terms of the usual asymptotic expansion of its mean 
integrated squared error, 

AMISE(h) = {h)- l R{K) + h A a A k R{f")/A. 
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So the direct plug-in method won't work here. Even we can view the right- 
censored data as a data set of the same size with invisible data points, we will 
have trouble to compute the Sd(-) and Td(.) in Sheather and Jones (1991) 
(page 689) [22]. 

In this study, we compute h p by using the direct plug-in method bandwidth 
selector for un- weighted data and plug- in it to (j4|). The performance of h n and 
h e will be compared to h p in Section Simulation results show that h p works 
well for right-censored data although the plug-in method does not applicable 
for weighted samples. 



2.3. Least-squares cross-validation 



To select the bandwidth completely automatically, we apply the least-squares 
cross-validation (LSCV) method after we computed the bandwidth by ((9| or 
(fl2|) . The LSCV optimal bandwidth will be found by finding an h in the nearby 
regions of the rough estimate(s) that minimizes the intcrgrated square error 
(ISE) of /, 

ISE(f) = /(/ - ff = / f 2 -2 f ff+ f f. (13) 



Note that the third term to the right-hand-side does not depend on the data. 
So minimizing ISE in (|13p is equivalent to minimizing the sum of the first two 
terms. The first term can be computed by using / in For simplicity, we 
denote w(Xi, Zi) as w, and we have 

* J 

t—x/h 1 v v i I X<i \ / X 



The integral part is convolution of the kernel with itself. Due to the fact that the 
convolution of two Gaussians is another Gaussian, by elementary manipulations, 
we obtain 



WiWjK 



Xi — Xj 



V2h 



The second term in (fT3"|) can be estimated by — 2/n ^2 f-i(Xi), where is 
a Jackknife estimate which is constructed based on the data set by leaving the 
i-th point out of computing, 
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We can further simplify the second term by, 



71 ' 



n ^ 

n 

i 

-Y 



y. ! :,'r,hi,[.r .V, 
f{X t ) - w t K h (0) 



E. 

f(Xi) - w t /V2n 



E 



If all weights sum up to 1, we can compute Ej^i w j by 1 — Wi- However, for 
censored survival data, the weights may not necessary be sum up to one. Here 
we simply leave the denominator as is. 

Thus, the LSCV optimal bandwidth, hi scv , can be found by minimizing 



Y Y w * w J K v2h i x i - X i)--Y 



f(Xi) - Wi /V2tt 



1 - Wi 



(15) 



Algorithm 2.1 To find hi scv , we do the following grid search, 

Step 1; First search on [hi,h u ], where hi — 0.25ft- and h u = 1.5th, and h 

is computed by either l§j) or US\). 

Step 2; Second, expand the searching as below: 

a. If the minimum occurred at the left edge, let h' u = hi + 5 and h\ = 
0.2h' u and repeat the grid search in Step 1 over [h'[,h' u ]. The S is an 
increment in each step of searching. It should not be too small when 
the sample size is reasonably large due to that it will tremendously 
slow down the algorithm. 

b. If the minimum occurred at the right edge, let h[ = h u — 5 and h' u — 
5ftJ and repeat the grid search in Step 1. 

c. If the minimum occurred not at the edge, let h[ = (hi + h m i n )/2 and 
h' u = (h u + h m i n )/2 and repeat the grid search in Step 1. 

Step 3: Repeat Step 1 and Step 2 for k times. 

When we expand the searching in Step 2, we prefer that the new searching 
range, [h[, h' u ], has a small overlap with the old one, [hi, h u ], in case that hi scv 
stays within [hi, hi + e] or [h u — e, h u ] for small e. As for how to select k, it 
depends on the value of S. We suggest a large k such as 5 or 6 and a large 5, 
say, 5 = (h u — hi)/ 20. We need also take the sample size into consideration. If 
the sample size is large, the optimal bandwidth tends to be small and we will 
prefer to adjust the coefficients in Step 2a and Step 2b such that the searching 
will be focused on the left side of the intervals; otherwise, on the right side. 
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3. Adaptive Weighted Kernel Density Estimator 

One of the drawbacks of using a fixed bandwidth over the whole range of the 
sample data is that for long-tailed distributions that either the details where 
the data are dense will be masked or spurious noise will show in the tails. 
For weighted samples, the shape of the estimated density will be greatly af- 
fected by the weighting function. For example, the MP estimator will correct 
the estimation bias in the following way: (1) all censored survival times will fi- 
nally be excluded from constructing the density estimate. (2) starting from the 
smallest observed time, the weight of a censored data point U will be equally 
re-distributed to the data points with t > U. As a result, the largest uncensored 
data point will gain more weight from the censored data points and could cause 
bump at the right tail of the estimated density. 

To illustrate the above assertion, we assume the true survival times are 

16, 17, 19, 20, 21, 22, 24, 25, 28, 35. 

A standard kernel density estimate was computed based on the above data by 
using the Gaussian kernel and the bandwidth was selected with the direct plug- 
in method (built-in R package KernSmooth 2.22 [27]). The estimated curve was 
shown as the solid curve in Figure [T] Now, if the 9th observation (T = 28) was 




10 15 20 25 30 35 40 

' ime 



Fig 1. Density estimate of righted- censored data 

censored and the censored value is, say, X = 26. The estimated density curve by 
the MP estimator shows an obvious bimodal pattern (dash-dotted curve). The 
dashed curve in Figure Q] is the estimated density curve by using the standard 
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kernel density estimator as in ([2|), which treats the censored time as the true 
survival time. If the survival times follow distributions with long tails such as 
exponential or Wcibull distribution, the uncensored observations in regions near 
the right tail will gain extra weights from the censored data points with small 
values and more likely spurious noise will show up in the estimated density 
curve. Following the idea of the conventional adaptive kernel estimator from 
[21] . we propose an adaptive weighted kernel density estimator (awKDE). The 
key idea of the adaptive method is to use shorter bandwidth in regions where 
the data are dense and use wider bandwidth in regions with sparse data. 

Algorithm 3.1 The following algorithm shows how to construct the adaptive 
esimates: 

Step 1: Find a pilot estimate f by ^ with bandwidth h n , h e or h p . 
Step 2: Define local bandwidth factor Xi by 

A, = {f(Xi)/g}- a , (16) 

where 

logg = n- 1 J2logf(X i ) 

and the sensitivity parameter a satisfies < a < 1 . 
Step 3: Define the adaptive kernel estimate f by 

m=±HX,,Z i )±K(L^i). (17) 

i— 1 N ' 

Literature shows that the pilot estimate in Step 1 is not that crucial [11 HI [21]. 
The performance of the awKDE and the choice of a will be studied via Monte- 
Carlo method in Section [5] 



4. Boundary Problem 

It is often the case that the natural domain of definition of a density to be 
estimated is an interval bounded on one or two sides. In [8], the per capita 
GDP mentioned above are measurements of positive quantities. In survival data 
analysis, the survival times will never be negative. There could also exist an 
upper bound in some other cases. 

If there are not many observations near zero, one possible solution is to 
calculate the estimate as if there is no restriction and then set f{x) to zero for 
negative x. Normalizing can also be done to ensure the estimate integrate to 
unity. Another remedy is to do the log-transformation to the data on the half- 
line and compute the estimate, then transform back to the original scale. This 
method could be useful, but the smoothness could be a potential problem: the 
smoothness is guaranteed for the transformed data by selecting an appropriate 
bandwidth, but not for the data at the original scale. Sun and Wang [25] showed 
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that the transformation based kernel density estimate sometimes is less smooth 
for the transformation X t — g(X) — X e+1 , where 9 > 0. 

Asymmetric kernels, such as inverse Gaussian, reciprocal inverse Gaussian 
and gamma- type kernels, were also considered to eliminate the difficulty of the 
kernel density estimation around the origin for censored data[51 HU [T3]. Muller 
and Wang (1994) defined a class of boundary kernels and proposed to reduce 
the boundary effects by using boundary kernels in the boundary regions and 
varying bandwidth under minimum mean squared errors criteria [17]. 

In this study, the reflection method and replication method is adopted to 
solve the boundary problem^]. By adding reflections of all points in the data, 
we get a new data set {Xi, —Xi,X 2 , —X 2 , ■ ■ ■ X n , — X n }. Let /' be the kernel 
density estimate constructed based on the new data set. We can show that the 
density of the original data set can be computed by 



Of course we need not reflect all data points. Because a point stays 4tr away from 
x will contribute very little to the density at x, we reflect points Xi 6 [0, Ah) 
for i = 1, 2, . . . , n. The new weighted density estimator can be rewritten as 



where /(.) is an indication function. 
5. Simulation 

Simulation results will be presented in two parts. In part 1, we will illustrate 
the performance of hi scv and the adaptive bandwidths. In part 2, we will show 
the performance of h ni h e and h p . 

5.1. Part 1: LSCV and awKDE 

Expereients were done to illustrate the performance of hi scv and the adaptive 
bandwidth. The following algorithm was used. 

Step 1: draw a random sample from the targeted population; 
Step 2: compute h n and h p based on the sample; 
Step 3: search hi scv based on h n ] 

Step 4: compute the wKDEs with h n , h p and hi scv respectively and the 
corresponding L\ distances. 

Step 5: take sensitivity parameter a = 0.3,0.4,0.5,0.6,0.7 and compute 
the pilot estimates of / by wKDE with h n , hi scv and h p respectively; 
compute the corresponding Li distances. 




for x > 0, 
for x < 0. 



n 



f{x) = «>{Xi, Zi) [Kh{x - Xi) + K h (x + Xi) ■ 7 < B< 4fc] , 



(18) 
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We used two different population distributions: N(l 3, 3 2 ) and Weibull(2, 1) and 
took different sample sizes n — 20, 30, 50, f 00, 300. For each pair of n and /, the 
above procedure was repeated for 10000 times and the mean and standard error 
of the L\ distances were computed. 

5.1.1. Complete Data 

Simple random samples were drawn from iV(13, 3 2 ). All data points in the sam- 
ples were equally weighted, under which all wKDE estimates reduce to the 
standard kernel density estimate. 

Table Q] shows the simulation results. The two values to the left are the mean 



Table 1 

Complete data from Normal distribution 



Size 


h v 


awKDE 


h n 


awKDE 




awKDE 


20 


.274 


.277 


.281 


.288 


.320 


.315 


(o-3) 


1.143 


1.180 


1.111 


1.142 


1.611 


1.664 


30 


.235 


.236 


.232 


.238 


.273 


.267 


(o-3) 


.923 


.955 


.866 


.892 


1.276 


1.335 


50 


.194 


.193 


.192 


.197 


.225 


.219 


(o-3) 


.729 


.746 


.676 


.692 


.997 


1.054 


100 


.148 


.147 


.147 


.151 


.170 


.166 


(o-3) 


.523 


.528 


.484 


.492 


.710 


.754 


300 


.097 


.095 


.097 


.099 


.108 


.105 


(o-3) 


.306 


.298 


.282 


.281 


.403 


.425 



L\ distance of the wKDE by using either h p or h n or hi scv respectively (top) 
and its standard error (bottom). While the values to the right are those of the 
awKDE by using the corresponding bandwidth estimates. For each setting, only 
the best result for different a is displayed. In all cases, simulation results suggest 
a = 0.3. 

From Table [I] we find that the mean L\ distances and the standard errors 
decrease as n increases. The performance of h n and h p is similar: both outper- 
form hi scv . The awKDE improves the estimate for hi scv . For h n and h p , the 
awKDE does not improve the estimates. It improves the estimate for h p a little 
bit when n is large, and makes the estimate worse for h n . 

5.1.2. Incomplete Data 

We drew random samples from both -/V(13, 3 2 ) and Weibull(2, 1), with 30% of 
the data points randomly right-censored. The weighting function w(.) is taken 
to be the jump sizes of the Kaplan-Meier estimator as MP estimator. Simulation 
results are listed in Tableland Table The sample sizes in the two tables are 
the sizes of the original data before censoring. We took larger sample sizes such 
that we had approximately the same amount of uncensored data points, as in 
Table [T] in computing the kernel density estimates. We can find that hi scv does 
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Table 2 

Incomplete data from Normal population 



Size 


hp 


awKDE 


h„ 


awKDE 


^Iscv 


awKDE 


30 


.269 


.271 


.265 


.272 


.312 


.306 




1.112 


1.156 


1.048 


1.084 


1.535 


1.593 


40 


.240 


.242 


.239 


.245 


.278 


.272 


(c-3) 


.964 


1.005 


.908 


.942 


1.320 


1.382 


70 


.197 


.198 


.196 


.202 


.226 


.221 


(e-3) 


.742 


.763 


.694 


.716 


1.004 


1.055 


140 


.156 


.155 


.156 


.160 


.177 


.173 


(c-3) 


.551 


.563 


.517 


.530 


.717 


.759 


300 


.120 


.121 


.121 


.125 


.134 


.132 




.394 


.399 


.374 


.379 


.491 


.522 








Table 3 








Incomplete data from 


Weibull population 




Size 


hp 


awKDE 


h n 


awKDE 




awKDE 


30 


.297 


.301 


.296 


.302 


.413 


.414 


(e-3) 


1.149 


1.190 


1.124 


1.158 


1.134 


1.324 


40 


.269 


.272 


.268 


.273 


.398 


.397 


(e-3) 


.993 


1.025 


.985 


1.011 


1.298 


1.262 


70 


.230 


.231 


.229 


.232 


.373 


.371 


(e-3) 


.797 


.811 


.790 


.800 


1.266 


1.212 


140 


.194 


.194 


.193 


.194 


.353 


.347 


(e-3) 


.606 


.609 


.603 


.605 


1.339 


1.280 


300 


.168 


.168 


.166 


.167 


.330 


.323 


(e-3) 


.465 


.460 


.462 


.459 


1.471 


1.413 



not work as well as h n and h p for data from both populations. The adaptive 
method improves the estimate with hi scv , but not those with h p and h n . 

Remarks: (a) The hi scv does not work well. This is consistent to the conclu- 
sion in [2j , where Altman and Leger suggested plug- in estimator instead of using 
leave-one-out or leave-some-out method to seek optimal bandwidth. For right- 
censored data, the reweighting scheme will compromise the sparseness of data 
at the right tail and the adaptive method won't work as well as expected, (b) 
The rough approach h n outperforms the other two methods (in some settings, 
its performance is very similar to h p ). 



5.2. Part 2: Performance of h n , h e and h p 



In this part, we studied the performance of h n , h e and h p by comparing with 
an existing estimator by Kuhn and Padgett (1997, KP estimator hereafter) |12j. 
The KP estimator is an estimator proposed for survival data subject to random 
right-censoring which selects the bandwidth locally by minimizing a mean ab- 
solute error, which is supposed to be more nature than the mean squared error 
criteria [6]. The optimal bandwidth used by KP estimator is 



_ f Aa?f{x)R{K) ^ ' '■ 
kp[ ' ~ \n^ 2 f"{x) 2 H*{x) 



(19) 
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where a — 0.4809489. When a Gaussian kernel is used, we have R(K) — 
(2,/tt)~ 1 and ^ = 1- The censoring survival function, H*(x), is estimated 
by the product-limit (PL) estimator, H*(x) — 1 — H(x), where 

f 1, ( _ a 0<a:<X (1) , 

H(x) = l nt=i (tStt) 1 A< - X {k _ 1) <x<X (k) ,k = 2,...,n, 
I 0, x > X {n) . 

Here Xn\, • • • , Xr n ) are the order statistics of Xi, ■ ■ ■ , X„ . An exponential refer- 
ence density, fn(x) = A -1 exp(— x/X), is preferred, where A is estimated by the 
maximum likelihood estimate 

V™ A." 

Thus we have 

/Up (a:) = 0.7644174 • \H* {x)- 1 ' b e x ' 5 ' x n - 1 ' b (20) 
and we can express the KP estimate by 

Random samples were drawn from three different distributions: (a) normal 
distribution with mean 13 and variance 9, (b) exponential distribution with 
mean 1, and (c) Weibull distribution with shape parameter 2 and scale param- 
eter 1. For each sample, approximately 30% of the data points were randomly 
right-censored. Based on the censored data together with the censoring informa- 
tion (A), we estimated the density by wKDE with different bandwidths h n , h e , 
hp and h kp respectively. The L\ distances were computed and shown in Tabled] 
through Table [5] together with the corresponding standard errors. 



Table 4 
7V(13,3 2 ) 



Size 




h n 


he 


h p 


30 


.741 


.283 


.276 


.267 


(e-3) 


.590 


1.076 


1.105 


1.095 


50 


.710 


.234 


.230 


.224 


(e-3) 


.501 


.828 


.845 


.887 


100 


.654 


.180 


.178 


.175 


(e-3) 


.381 


.601 


.607 


.665 


200 


.586 


.143 


.142 


.139 


(e-3) 


.284 


.442 


.445 


.492 



From Table SI it can be found that h n , h e and h p all work well for censored 
data from the normal population, while h p outperforms the other three methods. 
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Table 5 








Exponential (1) 




Size 


^kp 


h n 


h e 


hp 


30 


.263 


.335 


.334 


.382 


(e-3) 


1.200 


1.287 


1.291 


1.207 


50 


.243 


.301 


.301 


.345 


(e-3) 


1.004 


1.032 


1.037 


.941 


100 


.223 


.264 


.264 


.302 


(e-3) 


.793 


.777 


.779 


.704 


200 


.207 


.236 


.237 


.267 


(e-3) 


.614 


.592 


.590 


.532 






Table 6 








Weibull(2, 1) 




Size 


^kp 


h n 


h e 


hp 


30 


.344 


.299 


.296 


.289 


(e-3) 


.843 


1.128 


1.142 


1.104 


50 


.321 


.257 


.257 


.251 


(e-3) 


.740 


.905 


.919 


.890 


100 


.284 


.212 


.213 


.210 


(e-3) 


.603 


.676 


.688 


.676 


200 


.248 


.181 


.182 


.180 


(e-3) 


.476 


.524 


.531 


.524 



The estimator h n has a relatively larger mean L\ distances and the smallest 
standard errors. The KP estimator does not work well. This may be due to the 
fact that we used a normal density while KP estimator assume an exponential 
reference density. Though, the performance of h e was not affected much because 
we took the minimum of s w and IQR W /1.34: in (fl2|) . As shown in Table El 
when / is actually an exponential p.d.f., the KP estimator outperforms the 
other three methods as expected. This is not surprising because it uses more 
(correct) information than the others. The performance of h n and h e are alike, 
both outperform h p . In Table [6l when a Weibull population was used, h p is the 
winner. Both h n and h e also work well. When n increases, their performance 
becomes very similar. 

6. Applications 

6.1. Density Estimation from Biased Sampling 

The wKDE can be used to estimate the densities based on biased samples. In 
biased sampling, if whether an element with X — x will be observed depends 
on its true value x, we obtain a biased sample. Let's assume that Xi = Xi, will 
be sampled with probability b(Xi). Let f{x) be the population density, we can 
show that the density of the biased sample is a weighted version of /(x), 

f s {x) = b(x)f(x)/K, (22) 
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where k is a normalizing constant such that 

b(x)f(x)dx. 



Both b(.) and /(.) in (|22[) are non-parametrically identifiable if two or more 
random samples with overlaps are available |28j . However, based on just one 
sample, we need further restrictions on either b(.) or /(.) or both to ensure the 
idcntifiability |26[ 17] . Throughout this paper, we assume &(.) and further w(.) to 
be parametrically known. 

We can estimate f s (x) by a standard kernel density estimator as in @ and 
therefore obtain a natural estimate of f(x), 

n 

f b (x) = K f s (x)/b(x) = — jL-y K h& ~ *<)■ (23) 

^ * i— 1 

Wu (1997) proposed to estimate f(x) for s-dimension data by a kernel density 
estimator [3D]. We simply take s = 1 and get its univariate version estimate, 

n 

Uu(x) = K'^b-^X^Khix - Xi), (24) 

i=l 

where k' = 

Which estimate is better, fb{x) or f wu (x)? In ([2^]) . if the biasing function b(x) 
is coarse or not continuous, the estimate in (|2"3"|) may also be coarse. While in 
([24"]) , the estimate is smooth. A Monte Carlo study was carried out to compare 
their performance in density estimation based on biased samples. We first drew a 
random sample, X , of size 200 from a targeted population; second, we mimicked 
the biased sampling scheme by keeping observation X = x in the data set with 
probability b(x); finally, we computed fb and /„ based on the biased samples. 
To evaluate the performance, the L\ distance between / and / is computed, 

/m 
\f-f\ ^X^foO-ZCw)! -cU, (25) 
»=1 

where < yi < y 2 < ■ ■ ■ < y m and d. t = (y i+1 - Vi-ij/2 for i = 2, 3, . . . , m - 1 
and di=y 2 - yi, d rn = y m - y m -i- 

We took two targeted populations: (a) Weibull distribution with shape pa- 
rameter 2 and scale parameter 1; and (b) normal distribution with mean 10 
and standard deviation 2. Two different biasing function were used for biased 
sampling, 

bi(x) oc x 
b 2 (x) = 



' 0.2 


if x < pb — 1.2a, 






0.4 


if H — 1.2cr < x < 




- 0.4a, 


< 0.6 


if pi - 0.4cr < x < 




- 0.4(7, 


0.8 


if n + 0.4cr < x < 


M" 


- 1.2(7, 


1.0 


if X > /! + 1.2(7. 
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We repeated the above procedure for 10000 times for each setting and approx- 
imate the mean L\ distance and the standard error. The results are shown in 
Table [Jj We find in all the four scenarios, f wu outperforms 



Table 7 

Performance of wKDE and KDE for biased samples 







N(10,2) 


WeibuU(2,l) 


K) 




mean 


se 


mean 


se 


MO 


h 


.130 


4.37e-4 


.167 


6.51e-4 




fwu 


.127 


4.33e-4 


.150 


6.41e-4 


MO 


A 


.194 


4.43e-4 


.221 


5.80e-4 




fwu 


.145 


5.36e-4 


.167 


5.76e-4 



In Figure^ the Weibull distribution was used in plot (a) and (c), and the 





Fig 2. Kernel density estimate of length biased data 



normal distribution was used in plot (b) and (d). In Figure [21 the solid curves 
show the true density curves of /. The dashed curves and dotted curves represent 
the estimated density curves by f wu and /& respectively. In plot (a) and (b), we 
find that when b(x) is smooth, both fa and f wu are smooth. The two estimators 
work similarly well except that /& has a boundary problem due to that b(x) — > 
when x — > 0. The results in Table [7] also demostrate that the difference between 
the two mean Li-distances of /& and f wu is not very large. However, in plot (c) 
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and (d), we can find that when the biasing function is a step function, f wu is 
still smooth, but fb is not. The difference between the two mean Li-distances 
also becomes larger. In conclusion, f wu has better performance than /;,. 

6.2. Density Estimation from Informative Censoring 

In a clinical trial by the Eastern Cooperative Oncology Group, the survival times 
of 61 patients with inoperable carcinoma of the lung, progressions of which are 
usually associated with shortened residual lifetime, who were treated with the 
drug cyclophosphamide were collected |14) . Among the 61 patients, 33 died and 
their survival times were observed and listed in Table [8] Table [9] lists the other 
28 censored observations representing patients whose treatment was terminated, 
at the times indicated, because of the appearance of metastatic disease or a 
significant increase in the size of their primary lesion. The eventually failure 
times of the 28 censored patients were also collected (in parentheses) by a follow- 
up study. These 28 failure times were contaminated due to that those patients 
received other therapies thought to be more beneficial than cyclophosphamide 
after they were removed from the study and their ultimate survival times are 
possibly slightly better than what they would have been if they were kept on 
study and continued on cyclophosphamide. 

Table 8 
Observed deaths 

0.43, 2.86, 3.14, 3.14, 3.43, 3.43, 3.71, 3.86, 6.14, 6.86, 9.00, 9.43, 10.71, 10.86, 11.14, 13.00, 
14.43, 15.71, 18.43, 18.57, 20.71, 29.14, 29.71, 40.57, 48.57, 49.43, 53.86, 61.86, 66.57, 68.71, 

68.96, 72.86, 72.86 

Table 9 

Observed Censored times (ultimate survival times) 

0.14(3.00), 0.14(12.43), 0.29(1.14), 0.43(17.14), 0.57(4.43), 0.57(5.43), 1.86(12.14), 
3.00(7.86), 3.00(13.86), 3.29(10.57), 3.29(34.43), 6.00(7.86), 6.00(38.00), 6.14(9.29), 
8.17(20.43), 10.57(25.00), 11.86(17.29), 15.57(51.57), 16.57(45.00), 17.29(24.14), 
18.71(29.43), 21.29(26.71), 23.86(29.00), 26.00(53.86), 27.57(49.71), 32.14(63.86), 
33.14(99.00), 47.29(48.71) 

Let T be the true survival times, T c be the censored times and T u be the 
ultimate survival times. We compute the observed residual lifetime of the 28 
patients by T r = T u — T c . The T r s contain the carry-over effects by treatments 
other than cyclophosphamide. Plot (a) of Figure [3] shows the relationship be- 
tween log(T r /T c ) and T c . The point marked with "X" is a potential outlier and 
we leave it out. A smoothed curve was fitted to the data shown in plot (a) by 
the following model, 

log(T rl /T c ) = S{T C ) + e, (26) 

where S(.) is a smooth function to be estimated. The fitted curve (solid line) 
shows a curvilinear pattern: the ratio of T r and T c was large for removed patients 
who stayed in the system for either shorter or longer period of times. The upper 
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(a) (b) 




10 20 30 40 10 20 30 40 

Time Censored limes 



Fig 3. Lung data 



and lower dotted curves were added at 2 standard errors above and below the 
estimate of the smooth. Plot (b) gives a scatter plot of T r on T c . A least squares 
line was added to the plot by fitting a simple linear regression model, 

T r2 = [3 + (3xT c + e. (27) 

The slope was computed to be j3\ = .4662, which is significantly different from 
zero with p-value= 0.03478. Two potential outliers were marked and excluded 
from computation. From the above two plots, we can see there do exist some 
relationship pattern between T r and T c . 

Although T r can not be used to replace the true residual times, it could pro- 
vide useful information about the true survival times. This type of information 
can be used to adjust the weighting function in wKDE to improve the estimate. 
Due to that the other therapies were thought to be more beneficial than cy- 
clophosphamide, it is reasonable for us to believe that a removed patient was 
more likely to die no later than the observed ultimate survival time, T < T u . 
In stead of reassigning the weight belongs to a censored data point Xj equally 
to all points thereafter, we can assign its weight to all points fall between Tf 
and Tf + T[ . If no observation lies inside the above range, we assign the weight 
of Ti to the observation which stays the closest to the upper boundary. The 
MP estimator will not work because we know that the true residual times were 
less likely to be larger than T u -T c . In this study, the two f r 's by and 
(|27|) lead to exactly the same kernel density estimate, which is shown as the 
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solid curve in Figure H] (labeled as "lm/gam"). In Figure 0J the dashed curve is 




20 40 60 80 

Time 



Fig 4. Density estimates 

the estimated density curve by taking T r = 0, where all removed patients were 
assumed to die right after T c given no other therapies were provided. While the 
dotted curve is the one by assuming the residual times are exactly the ultimate 
observed survival times. The dash-dotted curve was estimated by assuming ran- 
dom right censoring and computed by MP estimator. The survival functions 
were also computed and plotted in figure [5] based on the above different esti- 
mates of /. Obviously, the MP estimate is too optimistic and under-estimatcs 
the risk (dash-dotted curve). The dashed curve provides an estimate of the sur- 
vival function by assuming the dropout patients die right after the censored 
times. It over-estimates the risk and seems to be too pessimistic. By reassigning 
the weights to points in a neighborhood of the censored data points, we obtain 
the survival function as the solid curve in figure This curve is very close to 
the one by assuming the true survival times are the ultimate survival times. 

7. Summary 

In summary, the two rough approaches and the plug-in method work well in 
bandwidth selection for wKDE. If the target distribution is a exponential- like 
distribution, the KP estimator is also a good choice. The LSCV method and the 
adaptive estimator won't improve the estimate for wKDE. For large samples, 
Fourier transform or fast Fourier transform and kernels such as Epanechnikov 
kernel can be considered, which could remarkably improve the computational 

speed mi nam. 
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Time 



Fig 5. Survival Function Estimates 

By choosing an appropriate weighting function, the wKDE can be used to 
robustly and efficiently estimate the densities from survival data subject to ran- 
dom censoring. The situation for data subject to informative censoring is much 
complicated, because the it's hard to model the sampling scheme. A possible 
solution is try to classify the censored data points into several categories and use 
the prior information we have to define different weight-redistribution schemes, 
and finally apply wKDE to estimate the density. Alternatively, instead of as- 
signing weights to values after the censoring times equally or only to points in 
certain neighborhoods, we can also consider impute the censored times and re- 
assign weights to points in the nearby regions. When covariates are available, a 
parametric model or quantile regression could be more efficient. Further studies 
will be carried out and the results will be presented in another research paper. 

All algorithms have been implemented in R and C, and are available on 
request. 

Acknowledgement: special thanks go to Dr. Jiayang Sun her working group 
for many good suggestions and comments. 
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